{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "VGGNet_implementation.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyPATUQ7NPJuGjsYjNm0Qw+c",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Machine-Learning-Tokyo/CNN-Architectures/blob/master/Implementations/VGGNet/VGGNet_implementation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ggaoY-cLBRSm",
        "colab_type": "text"
      },
      "source": [
        "# Implementation of VGGNet\n",
        "\n",
        "<br><span style=\"font-size:14pt;\">We will use the tensorflow.keras Functional API to build VGG</span>\n",
        "(https://arxiv.org/pdf/1409.1556.pdf)\n",
        "\n",
        "---\n",
        "\n",
        "In the paper we can read:\n",
        "\n",
        ">**[i]** “All hidden layers are equipped with the rectification (ReLU (Krizhevsky et al., 2012)) non-linearity.”\n",
        ">\n",
        ">**[ii]** “Max-pooling is performed over a 2 × 2 pixel window, with stride 2.”\n",
        "\n",
        "<br>\n",
        "\n",
        "We will also use the following Table **[iii]**:\n",
        "\n",
        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/VGG.png width=\"500\">\n",
        "\n",
        "---\n",
        "\n",
        "## Network architecture\n",
        "\n",
        "- The network consists of 5 *Convolutional* blocks and 3 *Fully Connected* Layers\n",
        "\n",
        "- Each Convolutional block consists of 2 or more Convolutional layers and a Max Pool layer\n",
        "\n",
        "---\n",
        "\n",
        "## Workflow\n",
        "We will:\n",
        "1. import the neccesary layers\n",
        "2. write the code for the *Convolution blocks* \n",
        "3. write the code for the *Dense layers*\n",
        "4. build the model\n",
        "\n",
        "---\n",
        "\n",
        "### 1. Imports"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DinjziABBD1j",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras.layers import Input, Conv2D, \\\n",
        "     MaxPool2D, Flatten, Dense"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cvE99U2zBiIk",
        "colab_type": "text"
      },
      "source": [
        "### 2. *Convolution blocks*\n",
        "\n",
        "We start with the input layer:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RcHjItVhBfgV",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "input = Input(shape=(224, 224, 3))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nEWiqA4UCY_u",
        "colab_type": "text"
      },
      "source": [
        "#### 1st block\n",
        "\n",
        "from the paper:\n",
        ">- conv3-64\n",
        ">- conv3-64\n",
        ">- maxpool"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ttqOR85pBmOi",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = Conv2D(filters=64, kernel_size=3, padding='same', activation='relu')(input)\n",
        "x = Conv2D(filters=64, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dbrQrFxrCiY8",
        "colab_type": "text"
      },
      "source": [
        "#### 2nd block\n",
        "\n",
        "from the paper:\n",
        ">- conv3-128\n",
        ">- conv3-128\n",
        ">- maxpool"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "k1mflX6gCfmh",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = Conv2D(filters=128, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=128, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DyYJcx_dCnO4",
        "colab_type": "text"
      },
      "source": [
        "#### 3rd block\n",
        "\n",
        "from the paper:\n",
        ">- conv3-256\n",
        ">- conv3-256\n",
        ">- conv3-256\n",
        ">- maxpool"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "P3j-FKezCkv5",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = Conv2D(filters=256, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=256, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=256, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "__t-Sc5DCq2Y",
        "colab_type": "text"
      },
      "source": [
        "#### 4th and 5th block\n",
        "\n",
        "from the paper:\n",
        ">- conv3-512\n",
        ">- conv3-512\n",
        ">- conv3-512\n",
        ">- maxpool"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "h1ML3fW1CpCC",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)\n",
        "\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vzvBPfm3C03E",
        "colab_type": "text"
      },
      "source": [
        "### 3. Dense layers\n",
        "\n",
        "Before passing the output tensor of the last Convolutional layer to the first `Dense()` layer we flatten it by using the `Flatten()` layer.\n",
        "\n",
        "from the paper:\n",
        "\n",
        ">- FC-4096\n",
        ">- FC-4096\n",
        ">- FC-1000\n",
        ">- soft-max"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jxdZ084-DAiL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = Flatten()(x)\n",
        "x = Dense(units=4096, activation='relu')(x)\n",
        "x = Dense(units=4096, activation='relu')(x)\n",
        "output = Dense(units=1000, activation='softmax')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PYFd0pVuDR9f",
        "colab_type": "text"
      },
      "source": [
        "### 4. Model\n",
        "\n",
        "In order to build the *model* we will use the `tensorflow.keras.Model` object:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hb2PIaLvDPYF",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras import Model"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Xzpjfp77DXbD",
        "colab_type": "text"
      },
      "source": [
        "To define the model we need the input tensor(s) and the output tensor(s)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E8XXK9pFDX12",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "model = Model(inputs=input, outputs=output)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MZ3Grq3GDdiO",
        "colab_type": "text"
      },
      "source": [
        "## Final code"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "awScOJYtDabe",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras.layers import Input, Conv2D, \\\n",
        "     MaxPool2D, Flatten, Dense\n",
        "\n",
        "input = Input(shape=(224, 224, 3))\n",
        "\n",
        "x = Conv2D(filters=64, kernel_size=3, padding='same', activation='relu')(input)\n",
        "x = Conv2D(filters=64, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)\n",
        "\n",
        "x = Conv2D(filters=128, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=128, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)\n",
        "\n",
        "x = Conv2D(filters=256, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=256, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=256, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)\n",
        "\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)\n",
        "\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = Conv2D(filters=512, kernel_size=3, padding='same', activation='relu')(x)\n",
        "x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)\n",
        "\n",
        "x = Flatten()(x)\n",
        "x = Dense(units=4096, activation='relu')(x)\n",
        "x = Dense(units=4096, activation='relu')(x)\n",
        "output = Dense(units=1000, activation='softmax')(x)\n",
        "\n",
        "from tensorflow.keras import Model\n",
        "\n",
        "model = Model(inputs=input, outputs=output)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cANxHF-yDjBY",
        "colab_type": "text"
      },
      "source": [
        "## Model diagram\n",
        "\n",
        "<img src=\"https://raw.githubusercontent.com/Machine-Learning-Tokyo/CNN-Architectures/master/Implementations/VGGNet/VGGNet_diagram.svg?sanitize=true\">"
      ]
    }
  ]
}